-
Notifications
You must be signed in to change notification settings - Fork 455
feature: deferred loading and requirement pruning #1199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…t checks via plugin cache
] | ||
plugin_replicate = [ | ||
"replicate>=0.8.3", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding to allows for pip install garak[plugin_all]
:
plugin_all = garak[
plugin_cohere,
plugin_langchain,
plugin_llava,
plugin_litellm,
plugin_mistralai,
plugin_nemoguardrails,
plugin_nemollm,
plugin_octoai,
plugin_ollama,
plugin_replicate,
]
There might be a better name to use for this group.
requirements.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep requirements.txt
as a consolidated lists of all possible requirements and require usage of pyproject.toml
to get a limited dependencies install?
Another option to me would be simply remove requirements.txt
and require all installs be based on using pyproject.toml
until we implement a lock file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. After migrating the gh workflows to use pyproject.toml, requirements.txt becomes superfluous - I'm leaning toward the latter.
Removal does mean we lose a cross check in test_reqs, but that's OK. Could do with a test to ensure all packages (including optional) specity a version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall! Just a handful of comments throughout.
# check cache for optional imports | ||
if category in PLUGIN_TYPES: | ||
extra_dependency_names = PluginCache.instance()[category][full_plugin_name][ | ||
"extra_dependency_names" | ||
] | ||
if len(extra_dependency_names) > 0: | ||
for dependency_module_name in extra_dependency_names: | ||
for dependency_path in [ # support both plain names and also multi-point names e.g. langchain.llms | ||
".".join(dependency_module_name.split(".")[: n + 1]) | ||
for n in range(dependency_module_name.count(".") + 1) | ||
]: | ||
if importlib.util.find_spec(dependency_path) is None: | ||
_import_failed(dependency_path, full_plugin_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really the best way to do this? Perhaps we just enforce lazy loading throughout instead? I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess that is what we're doing. This is the hazard of doing code reviews linearly, I suppose.
def _load_deps(self): | ||
# load external dependencies. should be invoked at construction and | ||
# in _client_load (if used) | ||
for extra_dependency in self.extra_dependency_names: | ||
extra_dep_name = extra_dependency.replace(".", "_").replace("-", "_") | ||
if ( | ||
not hasattr(self, extra_dep_name) | ||
or getattr(self, extra_dep_name) is None | ||
): | ||
setattr( | ||
self, | ||
extra_dep_name, | ||
garak._plugins.load_optional_module(extra_dependency), | ||
) | ||
|
||
def _clear_deps(self): | ||
# unload external dependencies from class. should be invoked before | ||
# serialisation, esp. in _clear_client (if used) | ||
for extra_dependency in self.extra_dependency_names: | ||
extra_dep_name = extra_dependency.replace(".", "_") | ||
setattr(self, extra_dep_name, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be in Configurable
instead, since it can/should be used across all base classes?
@@ -158,19 +159,15 @@ class OptimumPipeline(Pipeline, HFCompatible): | |||
generator_family_name = "NVIDIA Optimum Hugging Face 🤗 pipeline" | |||
supports_multiple_generations = True | |||
doc_uri = "https://huggingface.co/blog/optimum-nvidia" | |||
extra_dependency_names = ["optimum-nvidia"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor note that has little to do with this PR: it does drive me a bit nuts that the dependency name and the import statement so often do not match.
def _load_client(self): | ||
self._load_deps() | ||
if hasattr(self, "generator") and self.generator is not None: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of DRYness, I notice this exact code repeated across a number of the HFCompatible
classes; perhaps we should refactor into HFCompatible
?
@@ -53,14 +50,7 @@ def __init__(self, name="", config_root=_config): | |||
|
|||
super().__init__(self.name, config_root=config_root) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing a call to self._load_deps()
?
@@ -105,6 +96,7 @@ class LiteLLMGenerator(Generator): | |||
"skip_seq_start", | |||
"skip_seq_end", | |||
"stop", | |||
"verbose", | |||
) | |||
|
|||
def __init__(self, name: str = "", generations: int = 10, config_root=_config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing a call to self._load_deps()
?
@@ -37,6 +36,7 @@ class NeMoGenerator(Generator): | |||
|
|||
supports_multiple_generations = False | |||
generator_family_name = "NeMo" | |||
extra_dependency_names = ["nemollm"] | |||
|
|||
def __init__(self, name=None, config_root=_config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing a call to self._load_deps()
?
I keep asking this but I may be missing some logic that's inherited, so my question is very sincere.
|
||
def __init__(self, name="", config_root=_config): | ||
from octoai.client import Client | ||
|
||
self.name = name | ||
self._load_config(config_root) | ||
self.fullname = f"{self.generator_family_name} {self.name}" | ||
|
||
super().__init__(self.name, config_root=config_root) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing a call to self._load_deps()
?
@@ -28,17 +32,18 @@ class OllamaGenerator(Generator): | |||
active = True | |||
generator_family_name = "Ollama" | |||
parallel_capable = False | |||
extra_dependency_names = ["ollama"] | |||
|
|||
def __init__(self, name="", config_root=_config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing a call to self._load_deps()
?
I'm going to stop asking every time and assume I've missed some inherited logic.
@@ -158,6 +158,7 @@ def __setstate__(self, d) -> object: | |||
def _load_client(self): | |||
# When extending `OpenAICompatible` this method is a likely location for target application specific | |||
# customization and must populate self.generator with an openai api compliant object | |||
self._load_deps() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to have added the openai
dependency as extra_dependency_names
. Is that deliberate?
Support construction-time loading of optional modules. Includes
pyproject.toml
options and pruned fromrequirements.txt
_load_deps
and_clear_deps
pattern, used in generator constructor and_load_client
/_clear_client
Todo / in scope:
garak._plugins
? How - assign func member? create mixin?)cohere
langchain
litellm
mistral
nemoguardrails
nemollm
octo
ollama
optimum
(huggingface.OptimumPipeline) - exception fires, pkg install seems borkedreplicate
Not done:
generators
Out of scope:
pyproject.toml
Resolves #101